A probabilistic approach to sequence assembly validation
Authors
Abstract
Sequence assembly is essential for determining the complete sequence of long DNA. However, sequence assembly programs often generate misassembled contigs, either by joining different repeat copies, which joins noncontiguous DNA regions (inverted or swapped), or by including many fragments from different repeat copies, which produces errors in the consensus sequence (noisy regions). Usually, sequence assemblies are validated experimentally. While this is the most reliable approach, it is time-consuming and labor-intensive. In this paper, we propose a probabilistic approach to identify possible misassembled regions in shotgun sequence assemblies. Based on statistics from a set of randomly sampled patterns in the shotgun data, we propose a probability model that measures each fragment's contribution to misassembly. From the probability model, we compute the entropy at each base position in the contig assembly. Our approach correctly identified all misassembled regions in the assembly of the Mycoplasma genitalium genome from real shotgun sequence data. Furthermore, using this approach we identified many putative misassembled regions in the assemblies of bacterial genomes we are currently sequencing.
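The abstract does not spell out the probability model, so the following is only a rough illustration of what a per-position entropy profile over a contig layout can look like. It swaps in a much simpler quantity, the Shannon entropy of the observed base frequencies in each alignment column, and is not the authors' fragment-contribution model. The function names, the gap-padded row format, and the toy fragments are all assumptions made for this sketch.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the bases observed in one alignment column.

    Characters other than A, C, G, T (for example the '-' padding used
    below) are ignored. An empty column has entropy 0.
    """
    counts = Counter(base for base in column if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # H = sum over bases of p * log2(1 / p), with p = n / total
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def entropy_profile(aligned_fragments):
    """Entropy at each position of equal-length, gap-padded fragment rows."""
    return [column_entropy(col) for col in zip(*aligned_fragments)]


# Hypothetical toy layout: '-' marks positions a fragment does not cover.
fragments = [
    "ACGTAC--",
    "ACGTTCGA",
    "--GTACGA",
    "ACGTTCGA",
]
profile = entropy_profile(fragments)
```

In this toy layout only the fifth column mixes two bases (A and T in equal numbers), so it is the only position with nonzero entropy. In a real assembly, a run of high-entropy positions would be a candidate noisy region under this simplified measure.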
Similar resources
A Multi-objective Mixed Model Two-sided Assembly Line Sequencing Problem in a Make-To-Order Environment with Customer Order Prioritization
Mixed model two-sided assembly lines (MM2SAL) are used to assemble large product models that are produced in high volume. Sequence planning of products to reduce cost and increase productivity on such lines is therefore imperative. The presented problem is tackled in two steps. In step 1, a framework is developed to select and prioritize customer orders under the finite capacity of th...
Assembly line balancing to minimize balancing loss and system loss
Assembly line production is one of the most widely used basic principles in production systems. The assembly line balancing problem deals with distributing activities among workstations so that human resources and facilities are utilized as fully as possible without disturbing the work sequence. Research reported in the literature mainly deals with minimization of idle time i.e...
Reduction of production disturbances of a shoemaking industry through a discrete event simulation approach
This study presents a reduction of production disturbances in a shoemaking factory through a discrete event simulation approach. The study was conducted at the Peacock Shoe factory in Addis Ababa, Ethiopia. The factory faces a line balancing problem that disturbs production on its assembly lines. A detailed time study was carried out for the selected shoe model using a stopwatch. Assembly ...
A two-stage stochastic rule-based model to determine pre-assembly buffer content
This study considers the instant decision-making needs of automobile manufacturers when resequencing vehicles before final assembly (FA). We propose a rule-based two-stage stochastic model to determine the number of spare vehicles that should be kept in the pre-assembly buffer to restore a sequence altered by paint defects and upstream department constraints. First stage of the model decide...
Modeling the Hybrid Flow Shop Scheduling Problem Followed by an Assembly Stage Considering Aging Effects and Preventive Maintenance Activities
This paper studies the hybrid flow shop scheduling problem (HFSP) followed by an assembly stage, considering aging effects and preventive maintenance activities. In this production system, a number of products of different kinds are produced. Each product is assembled from a set of several parts. The first stage is a hybrid flow shop to produce parts. All ...